Abstract

We derive non-asymptotic confidence regions for the mean of a random
vector whose coordinates have an unknown dependence structure. The
random vector is assumed to be either Gaussian or to have a symmetric
bounded distribution, and we observe n i.i.d. copies of it. The
confidence regions are built using a data-dependent threshold based
on a weighted bootstrap procedure. We consider two approaches, the
first based on a concentration approach and the second on a direct
bootstrapped quantile approach. The first one allows us to deal with a
very large class of resampling weights, while our results for the second
are restricted to Rademacher weights. However, the second method
seems more accurate in practice. Our results are motivated by multiple
testing problems, and we show on simulations that our procedures
are better than the Bonferroni procedure (union bound) as soon as the
observed vector has sufficiently correlated coordinates.
1 Introduction

In this work, we assume that we observe a sample
$\mathbf{Y} := (Y^1, \ldots, Y^n)$ of $n \ge 2$ i.i.d. observations of an
integrable random vector $Y^i \in \mathbb{R}^K$ with a dimension $K$
possibly much greater than $n$. Let $\mu \in \mathbb{R}^K$ denote the common
mean of the $Y^i$; our main goal is to find a non-asymptotic
$(1-\alpha)$-confidence region for $\mu$, of the form:

$$\left\{ x \in \mathbb{R}^K \ \text{s.t.}\ \phi(\overline{\mathbf{Y}} - x) \le t_\alpha(\mathbf{Y}) \right\}, \tag{1}$$

where $\phi : \mathbb{R}^K \to \mathbb{R}$ is a measurable function fixed in
advance by the user (measuring a kind of distance), $\alpha \in (0,1)$,
$t_\alpha : (\mathbb{R}^K)^n \to \mathbb{R}$ is a measurable data-dependent
threshold, and $\overline{\mathbf{Y}} = \frac{1}{n} \sum_{i=1}^n Y^i$ is the
empirical mean of the sample $\mathbf{Y}$.

The form of the confidence region (1) is motivated by the following
multiple testing problem: if we want to test simultaneously for all
$1 \le k \le K$ the hypotheses $H_{0,k} = \{\mu_k \le 0\}$ against
$H_{1,k} = \{\mu_k > 0\}$, we propose to reject the $H_{0,k}$
corresponding to

$$\left\{ 1 \le k \le K \ \text{s.t.}\ \overline{\mathbf{Y}}_k > t_\alpha(\mathbf{Y}) \right\}.$$

The error of this multiple testing procedure can be measured by the
family-wise error rate, defined as the probability that at least one
hypothesis is wrongly rejected. Here, this error will be strongly (i.e. for
any value of $\mu$) controlled by $\alpha$ as soon as the confidence region
(1) for $\mu$ with $\phi = \sup(\cdot)$ is of level at least $1-\alpha$.
Indeed, for all $\mu$,

$$\mathbb{P}\left( \exists k \ \text{s.t.}\ \overline{\mathbf{Y}}_k > t_\alpha(\mathbf{Y}) \ \text{and}\ \mu_k \le 0 \right)
\le \mathbb{P}\left( \exists k \ \text{s.t.}\ \overline{\mathbf{Y}}_k - \mu_k > t_\alpha(\mathbf{Y}) \right)
= \mathbb{P}\left( \sup_k \left\{ \overline{\mathbf{Y}}_k - \mu_k \right\} > t_\alpha(\mathbf{Y}) \right).$$

The same reasoning with $\phi = \sup|\cdot|$ allows us to test
$H_{0,k} = \{\mu_k = 0\}$ against $H_{1,k} = \{\mu_k \ne 0\}$, by choosing the
rejection set
$\{ 1 \le k \le K \ \text{s.t.}\ |\overline{\mathbf{Y}}_k| > t_\alpha(\mathbf{Y}) \}$.

While this goal is statistical in motivation, to tackle it we want to follow
a point of view inspired by learning theory, in the following sense: first, we
want a non-asymptotic result valid for any fixed $K$ and $n$, and secondly,
we want to make no assumptions on the dependency structure of the
coordinates of $Y^i$ (although we will consider some general assumptions
on the distribution of $\mathbf{Y}$, for example that it is Gaussian).

The ideal threshold $t_\alpha$ in (1) is obviously the $1-\alpha$ quantile of
the distribution of $\phi(\overline{\mathbf{Y}} - \mu)$. However, this
quantity depends on the unknown dependency structure of the coordinates of
$Y^i$ and is therefore itself unknown.

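We propose here to approach $t_\alpha$ by some resampling scheme: the
heuristics of the resampling method (introduced by Efron [Efr79]) is that the
distribution of $\overline{\mathbf{Y}} - \mu$ is "close" to the one of

$$\overline{\mathbf{Y}}_{[W-\overline{W}]} := \frac{1}{n} \sum_{i=1}^n (W_i - \overline{W}) Y^i
= \frac{1}{n} \sum_{i=1}^n W_i (Y^i - \overline{\mathbf{Y}})
= \overline{(\mathbf{Y} - \overline{\mathbf{Y}})}_{[W]},$$

conditionally to $\mathbf{Y}$, where $(W_i)_{1 \le i \le n}$ are real random
variables independent of $\mathbf{Y}$ called the resampling weights, and
$\overline{W} = n^{-1} \sum_{i=1}^n W_i$. We emphasize that the family
$(W_i)_{1 \le i \le n}$ itself need not be independent.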
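As an illustration, the resampled empirical mean defined above can be computed in two equivalent ways. The following is only a minimal sketch: the function names are hypothetical, the sample is stored as a list of n observation vectors (rather than a K x n matrix), and Rademacher weights are used purely as an example.

```python
import random


def resampled_mean(sample, weights):
    """(1/n) * sum_i (W_i - mean(W)) * Y^i, computed coordinate by coordinate."""
    n, dim = len(sample), len(sample[0])
    w_bar = sum(weights) / n
    return [sum((weights[i] - w_bar) * sample[i][k] for i in range(n)) / n
            for k in range(dim)]


def resampled_mean_centered(sample, weights):
    """Equivalent form (1/n) * sum_i W_i * (Y^i - Ybar)."""
    n, dim = len(sample), len(sample[0])
    y_bar = [sum(y[k] for y in sample) / n for k in range(dim)]
    return [sum(weights[i] * (sample[i][k] - y_bar[k]) for i in range(n)) / n
            for k in range(dim)]


rng = random.Random(0)
n, K = 20, 5
# Toy Gaussian sample: n observations of a K-dimensional vector.
sample = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]
# Example weights: i.i.d. Rademacher signs.
weights = [rng.choice((-1.0, 1.0)) for _ in range(n)]

first_form = resampled_mean(sample, weights)
second_form = resampled_mean_centered(sample, weights)
```

Both forms agree up to floating-point rounding, which reflects the identity stated in the display above.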

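Following this idea, we propose two different approaches to obtain
non-asymptotic confidence regions in this paper:

1. The expectations of $\phi(\overline{\mathbf{Y}} - \mu)$ and
$\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]})$ can be precisely compared,
and the processes $\phi(\overline{\mathbf{Y}} - \mu)$ and
$\mathbb{E}[\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y}]$
concentrate well around their expectations.

2. The $1-\alpha$ quantile of the distribution of
$\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]})$ conditionally to
$\mathbf{Y}$ is close to the one of $\phi(\overline{\mathbf{Y}} - \mu)$.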

Method 1 above is closely related to the Rademacher complexity approach
in learning theory, and our results in this direction are heavily inspired by
the work of Fromont [Fro04], who studies general resampling schemes in a
learning theoretical setting. It may also be seen as a generalization of
cross-validation methods. For method 2, we will restrict ourselves
specifically to Rademacher weights in our analysis, because we use a
symmetrization trick. Although this kind of method is not new in the
resampling literature, to our knowledge our result is the first to provide a
non-asymptotic analysis based on empirical resampled quantiles.

Let us now define some notation that will be useful throughout this
paper.

• Vectors, such as data vectors $Y^i = (Y^i_k)_{1 \le k \le K}$, will always
be column vectors. Thus, $\mathbf{Y}$ is a $K \times n$ data matrix.

• If $\mu \in \mathbb{R}^K$, $\mathbf{Y} - \mu$ is the matrix obtained by
subtracting $\mu$ from each (column) vector of $\mathbf{Y}$. If
$c \in \mathbb{R}$ and $W \in \mathbb{R}^n$,
$W - c = (W_i - c)_{1 \le i \le n} \in \mathbb{R}^n$.

• $\overline{\Phi}$ is the standard Gaussian upper tail function.

Several properties may be assumed for the function
$\phi : \mathbb{R}^K \to \mathbb{R}$:

• Subadditivity: $\forall x, x' \in \mathbb{R}^K$,
$\phi(x + x') \le \phi(x) + \phi(x')$.

• Positive-homogeneity: $\forall x \in \mathbb{R}^K$,
$\forall \lambda \in \mathbb{R}^+$, $\phi(\lambda x) = \lambda \phi(x)$.

• Bounded by the p-norm, $p \in [1,\infty]$: $\forall x \in \mathbb{R}^K$,
$|\phi(x)| \le \|x\|_p$, where $\|x\|_p$ is equal to
$\left( \sum_{k=1}^K |x_k|^p \right)^{1/p}$ if $p < \infty$ and
$\max_k \{|x_k|\}$ otherwise.

Finally, different assumptions on the generating distribution of
$\mathbf{Y}$ can be made:

(GA) The Gaussian assumption: the $Y^i$ are Gaussian vectors.

(SA) The symmetric assumption: the $Y^i$ are symmetric with respect to
$\mu$, i.e. $Y^i - \mu \sim \mu - Y^i$.

(BA)(p, M) The bounded assumption: $\|Y^i - \mu\|_p \le M$ a.s.

In this paper, our primary focus is on the Gaussian framework (GA), because 
the corresponding results will be more accurate. 

The paper is organized as follows: Section 2 deals with the concentration 
method with general weights. In Section 3, we propose an approach based on 
resampling quantiles, with Rademacher weights. We illustrate our methods 
in Section 4 with a simulation study. The proofs of our results are given in 
Section 5. 

2 Confidence region using concentration 

In this section, we consider a general $\mathbb{R}^n$-valued resampling
weight vector $W$, satisfying the following properties: $W$ is independent
of $\mathbf{Y}$, for all $i \in \{1, \ldots, n\}$
$\mathbb{E}[W_i^2] < \infty$, the $(W_i)_{1 \le i \le n}$ have an
exchangeable distribution (i.e. invariant under every permutation of the
indices) and the coordinates of $W$ are not a.s. equal, i.e.
$\mathbb{E}|W_1 - \overline{W}| > 0$. Several examples of resampling
weight vectors are given in Section 2.3, where we also tackle the question of
choosing a resampling.

Four constants that depend only on the distribution of $W$ appear in the
results below (the fourth one is defined only for a particular class of
weights). They are defined as follows and computed for classical resamplings
in Tab. 1:

$$A_W := \mathbb{E}\left| W_1 - \overline{W} \right| \tag{2}$$

$$B_W := \mathbb{E}\left[ \left( \frac{1}{n} \sum_{i=1}^n (W_i - \overline{W})^2 \right)^{1/2} \right] \tag{3}$$

$$C_W := \left( \frac{n}{n-1}\, \mathbb{E}\left[ (W_1 - \overline{W})^2 \right] \right)^{1/2} \tag{4}$$

$$D_W := a + \mathbb{E}\left| \overline{W} - x_0 \right| \quad \text{if } \forall i,\ |W_i - x_0| = a \text{ a.s. (with } a > 0,\ x_0 \in \mathbb{R}\text{)}. \tag{5}$$

Note that under our assumptions, these quantities are positive. Moreover,
if the weights are i.i.d., $C_W = \mathrm{Var}(W_1)^{1/2}$. We can now state
the main result of this section:

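Theorem 2.1. Fix $\alpha \in (0,1)$ and $p \in [1,\infty]$. Let
$\phi : \mathbb{R}^K \to \mathbb{R}$ be any function subadditive,
positive-homogeneous and bounded by the p-norm, and let $W$ be a resampling
weight vector.

1. If $\mathbf{Y}$ satisfies (GA), then

$$\phi(\overline{\mathbf{Y}} - \mu) < \frac{\mathbb{E}\left[ \phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y} \right]}{B_W}
+ \|\sigma\|_p\, \overline{\Phi}^{-1}(\alpha/2) \left[ \frac{C_W}{n B_W} + \frac{1}{\sqrt{n}} \right] \tag{6}$$

holds with probability at least $1-\alpha$, where $\sigma$ is the vector
$[\mathrm{Var}^{1/2}(Y^1_k)]_k$. The same bound holds for the lower
deviations, i.e. with inequality reversed and the additive term replaced by
its opposite.

2. If $\mathbf{Y}$ satisfies (BA)(p, M) and (SA), then

$$\phi(\overline{\mathbf{Y}} - \mu) < \frac{\mathbb{E}\left[ \phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y} \right]}{A_W}
+ \frac{2M}{\sqrt{n}} \sqrt{\log(1/\alpha)}$$

holds with probability at least $1-\alpha$. If moreover the weights satisfy
the assumption of (5), then

$$\phi(\overline{\mathbf{Y}} - \mu) > \frac{\mathbb{E}\left[ \phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y} \right]}{D_W}
- \frac{M}{\sqrt{n}} \sqrt{1 + \frac{A_W^2}{D_W^2}}\, \sqrt{2 \log(1/\alpha)}$$

holds with probability at least $1-\alpha$.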
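To make the Gaussian bound (6) concrete, here is a rough sketch of how the corresponding threshold could be evaluated for $\phi = \sup|\cdot|$ and Rademacher weights. Several choices below are assumptions rather than part of the theorem: the function names are hypothetical, the conditional expectation and $B_W$ are replaced by Monte-Carlo averages (an approximation whose effect on the level is not covered by the theorem), and `sigma_norm` stands for a known value of $\|\sigma\|_p$.

```python
import random
from math import sqrt
from statistics import NormalDist


def sup_abs(x):
    """phi = sup|.|: subadditive, positive-homogeneous, bounded by the sup norm."""
    return max(abs(v) for v in x)


def concentration_threshold(sample, sigma_norm, alpha, n_draws, rng):
    """Monte-Carlo approximation of the right-hand side of (6), Rademacher weights."""
    n, dim = len(sample), len(sample[0])
    phi_sum = 0.0
    b_sum = 0.0
    for _ in range(n_draws):
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        w_bar = sum(w) / n
        centered = [wi - w_bar for wi in w]
        resampled = [sum(centered[i] * sample[i][k] for i in range(n)) / n
                     for k in range(dim)]
        phi_sum += sup_abs(resampled)                    # towards E[phi(...) | Y]
        b_sum += sqrt(sum(c * c for c in centered) / n)  # towards B_W
    expected_phi = phi_sum / n_draws
    b_w = b_sum / n_draws
    c_w = 1.0  # i.i.d. Rademacher weights: C_W = Var(W_1)^(1/2) = 1
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper-tail inverse at alpha/2
    return expected_phi / b_w + sigma_norm * z * (c_w / (n * b_w) + 1.0 / sqrt(n))


rng = random.Random(0)
n, K = 30, 10
sample = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]
threshold = concentration_threshold(sample, sigma_norm=1.0, alpha=0.05,
                                    n_draws=500, rng=rng)
```

The sample is stored as a list of n observation vectors; with unit marginal variances and $p = \infty$, `sigma_norm=1.0` matches $\|\sigma\|_\infty$.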
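If there exists a deterministic threshold $t_\alpha$ such that
$\mathbb{P}(\phi(\overline{\mathbf{Y}} - \mu) > t_\alpha) \le \alpha$, the
following corollary establishes that we can combine the above concentration
threshold with $t_\alpha$ to get a new threshold that is almost as good as
the better of the two.

Corollary 2.2. Fix $\alpha, \delta \in (0,1)$, $p \in [1,\infty]$ and take
$\phi$ and $W$ as in Theorem 2.1. Suppose that $\mathbf{Y}$ satisfies (GA)
and that $t_{\alpha(1-\delta)}$ is a real number such that
$\mathbb{P}\left( \phi(\overline{\mathbf{Y}} - \mu) > t_{\alpha(1-\delta)} \right) \le \alpha(1-\delta)$.
Then with probability at least $1-\alpha$,
$\phi(\overline{\mathbf{Y}} - \mu)$ is upper bounded by the minimum between
$t_{\alpha(1-\delta)}$ and

$$\frac{\mathbb{E}\left[ \phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y} \right]}{B_W}
+ \frac{\|\sigma\|_p}{\sqrt{n}}\, \overline{\Phi}^{-1}\left( \frac{\alpha(1-\delta)}{2} \right)
+ \frac{\|\sigma\|_p C_W}{n B_W}\, \overline{\Phi}^{-1}\left( \frac{\alpha\delta}{2} \right). \tag{7}$$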

Remark 2.3. 1. Corollary 2.2 is a consequence of the proof of Theorem 2.1,
rather than of the theorem itself. The point here is that
$\mathbb{E}[\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y}]$
is almost deterministic, because it concentrates at the rate $n^{-1}$
($= o(n^{-1/2})$).

2. For instance, if $\phi = \sup(\cdot)$ (resp. $\sup|\cdot|$), Corollary 2.2
may be applied with $t_\alpha$ equal to the classical Bonferroni threshold
for multiple testing (obtained using a simple union bound over coordinates)

$$t_{\mathrm{Bonf},\alpha} := \frac{1}{\sqrt{n}} \|\sigma\|_\infty\, \overline{\Phi}^{-1}\left( \frac{\alpha}{K} \right)
\quad \left( \text{resp. } t'_{\mathrm{Bonf},\alpha} := \frac{1}{\sqrt{n}} \|\sigma\|_\infty\, \overline{\Phi}^{-1}\left( \frac{\alpha}{2K} \right) \right).$$

We thus obtain a confidence region almost equal to Bonferroni's for
small correlations and better than Bonferroni's for strong correlations
(see simulations in Section 4).
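The two Bonferroni thresholds of Remark 2.3 are straightforward to evaluate numerically. The sketch below uses only the Python standard library; the function names are hypothetical, and `sigma_max` stands for a known value of $\|\sigma\|_\infty$.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_inverse(p):
    """Inverse of the standard Gaussian upper tail function."""
    return NormalDist().inv_cdf(1.0 - p)


def bonferroni_threshold(sigma_max, n, K, alpha, two_sided=False):
    """One-sided threshold uses level alpha/K, two-sided uses alpha/(2K)."""
    level = alpha / (2 * K) if two_sided else alpha / K
    return sigma_max * upper_tail_inverse(level) / sqrt(n)


# Example values matching the dimensions used in the simulation section.
t_one_sided = bonferroni_threshold(1.0, n=1000, K=16384, alpha=0.05)
t_two_sided = bonferroni_threshold(1.0, n=1000, K=16384, alpha=0.05,
                                   two_sided=True)
```

The two-sided threshold is slightly larger than the one-sided one, since it splits the level over twice as many events.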

The proof of Theorem 2.1 involves results which are of independent interest:
the comparison between the expectations of the two processes
$\mathbb{E}[\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y}]$
and $\phi(\overline{\mathbf{Y}} - \mu)$, and the concentration of these
processes around their means. This is examined in the two following
subsections. The last subsection gives some elements for a wise choice of
resampling weight vectors among several classical examples.



2.1 Comparison in expectation 

In this section, we compare
$\mathbb{E}\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]})$ and
$\mathbb{E}\phi(\overline{\mathbf{Y}} - \mu)$. We note that these
expectations exist in the Gaussian and the bounded case provided
that $\phi$ is measurable and bounded by a p-norm. Otherwise, in particular in
Propositions 2.4 and 2.6, we assume that these expectations exist. In the
Gaussian case, these quantities are equal up to a factor that depends only
on the distribution of $W$:

Proposition 2.4. Let $\mathbf{Y}$ be a sample satisfying (GA) and $W$ a
resampling weight vector. Then, for any measurable positive-homogeneous
function $\phi : \mathbb{R}^K \to \mathbb{R}$, we have the following equality

$$B_W\, \mathbb{E}\phi(\overline{\mathbf{Y}} - \mu) = \mathbb{E}\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}). \tag{8}$$

Remark 2.5. 1. In general, we can compute the value of $B_W$ by simulation.
For some classical weights, we give bounds or exact expressions in Tab. 1.

2. In a non-Gaussian framework, the constant $B_W$ is still relevant, at least
asymptotically: in their Theorem 3.6.13, Van der Vaart and Wellner
[VdVW96] use the limit of $B_W$ when $n$ goes to infinity as a normalizing
constant.
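The simulation mentioned in Remark 2.5 can be sketched as follows for $A_W$ and $B_W$, taking Rademacher weights as the example. This is only an illustrative Monte-Carlo estimate with hypothetical function names; the number of draws and the seed are arbitrary choices.

```python
import random
from math import sqrt


def rademacher(n, rng):
    """n i.i.d. signs in {-1, +1} with equal probabilities."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def estimate_constants(draw_weights, n, n_draws, rng):
    """Monte-Carlo estimates of A_W = E|W_1 - mean(W)| and
    B_W = E[((1/n) * sum_i (W_i - mean(W))^2)^(1/2)]."""
    a_sum = 0.0
    b_sum = 0.0
    for _ in range(n_draws):
        w = draw_weights(n, rng)
        w_bar = sum(w) / n
        a_sum += abs(w[0] - w_bar)
        b_sum += sqrt(sum((wi - w_bar) ** 2 for wi in w) / n)
    return a_sum / n_draws, b_sum / n_draws


rng = random.Random(1)
n = 50
a_hat, b_hat = estimate_constants(rademacher, n, n_draws=20000, rng=rng)
```

Any other weight generator with the same signature (taking `n` and `rng`) could be passed in place of `rademacher`.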

When the sample is only symmetric we obtain the following inequalities:

Proposition 2.6. Let $\mathbf{Y}$ be a sample satisfying (SA), $W$ a
resampling weight vector and $\phi : \mathbb{R}^K \to \mathbb{R}$ any
subadditive, positive-homogeneous function.

(i) We have the following general lower bound:

$$A_W\, \mathbb{E}\phi(\overline{\mathbf{Y}} - \mu) \le \mathbb{E}\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}). \tag{9}$$

(ii) Moreover, if the weights satisfy the assumption of (5), we have the
following upper bound:

$$D_W\, \mathbb{E}\phi(\overline{\mathbf{Y}} - \mu) \ge \mathbb{E}\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}). \tag{10}$$

Remark 2.7. 1. The bounds (9) and (10) are tight for Rademacher and
Random hold-out (n/2) weights, but far less optimal in some other cases
like Leave-one-out (see Section 2.3).

2. When $\mathbf{Y}$ is not assumed to be symmetric and $\overline{W} = 1$
a.s., Proposition 2 in [Fro04] shows that (9) holds with
$\mathbb{E}(W_1 - \overline{W})_+$ instead of $A_W$. Therefore, the symmetry
of the sample allows us to get a tighter result (for instance twice sharper
with Efron or Random hold-out (q) weights).
2.2 Concentration around the expectations

In this section we present concentration results for the two processes
$\phi(\overline{\mathbf{Y}} - \mu)$ and
$\mathbb{E}[\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y}]$
in the Gaussian framework.

Proposition 2.8. Let $p \in [1,+\infty]$, $\mathbf{Y}$ a sample satisfying
(GA) and let $\sigma$ be the vector $[\mathrm{Var}^{1/2}(Y^1_k)]_k$. Let
$\phi : \mathbb{R}^K \to \mathbb{R}$ be any subadditive function, bounded by
the p-norm.

(i) For all $\alpha \in (0,1)$, with probability at least $1-\alpha$ the
following holds:

$$\phi(\overline{\mathbf{Y}} - \mu) < \mathbb{E}\phi(\overline{\mathbf{Y}} - \mu) + \frac{\|\sigma\|_p\, \overline{\Phi}^{-1}(\alpha/2)}{\sqrt{n}}, \tag{11}$$

and the same bound holds for the corresponding lower deviations.

(ii) Let $W$ be some exchangeable resampling weight vector. Then, for all
$\alpha \in (0,1)$, with probability at least $1-\alpha$ the following holds:

$$\mathbb{E}\left[ \phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y} \right]
< \mathbb{E}\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) + \frac{\|\sigma\|_p\, C_W\, \overline{\Phi}^{-1}(\alpha/2)}{n}, \tag{12}$$

and the same bound holds for the corresponding lower deviations.

The first bound (11) with a remainder in $n^{-1/2}$ is classical. The last
one (12) is much more interesting since it highlights one of the key
properties of the resampling idea: the "stabilization". Indeed, the
resampling quantity
$\mathbb{E}[\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y}]$
concentrates around its expectation at the rate
$C_W n^{-1} = o(n^{-1/2})$ for most of the weights (see Section 2.3 and
Tab. 1 for more details). Thus, compared to the original process, it is almost
deterministic and equal to $B_W\, \mathbb{E}\phi(\overline{\mathbf{Y}} - \mu)$.

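Remark 2.9. Combining expression (8) and Proposition 2.8 (ii), we derive
that for a Gaussian sample $\mathbf{Y}$ and any $p \in [1,\infty]$, the
following upper bound holds with probability at least $1-\alpha$:

$$\mathbb{E}\left\| \overline{\mathbf{Y}} - \mu \right\|_p
< \frac{\mathbb{E}\left[ \left\| \overline{\mathbf{Y}}_{[W-\overline{W}]} \right\|_p \mid \mathbf{Y} \right]}{B_W}
+ \frac{\|\sigma\|_p\, C_W}{n B_W}\, \overline{\Phi}^{-1}(\alpha/2), \tag{13}$$

and a similar lower bound holds. This gives a control with high probability of
the $L^p$-risk of the estimator $\overline{\mathbf{Y}}$ of the mean
$\mu \in \mathbb{R}^K$ at the rate $C_W B_W^{-1} n^{-1}$.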
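Resampling               Constants

Efron                    $2\left(1 - \frac{1}{n}\right)^n = A_W \le B_W \le \sqrt{\frac{n-1}{n}}$,  $C_W = 1$
Efr., $n \to +\infty$    $\frac{2}{e} = A_W \le B_W \le 1 = C_W$
Rademacher               $1 - \frac{1}{\sqrt{n}} \le A_W \le B_W \le \sqrt{1 - \frac{1}{n}}$,  $C_W = 1 \le D_W \le 1 + \frac{1}{\sqrt{n}}$
Rad., $n \to +\infty$    $A_W = B_W = C_W = D_W = 1$
R. h.-o. (q)             $A_W = 2\left(1 - \frac{q}{n}\right)$,  $B_W = \sqrt{\frac{n}{q} - 1}$,  $C_W = \sqrt{\frac{n}{n-1}} \sqrt{\frac{n}{q} - 1}$,  $D_W = \frac{n}{2q} + \left| 1 - \frac{n}{2q} \right|$
R. h.-o. (n/2) (2|n)     $A_W = B_W = D_W = 1$,  $C_W = \sqrt{\frac{n}{n-1}}$
Leave-one-out            $\frac{2}{n} = A_W \le B_W = \frac{1}{\sqrt{n-1}}$,  $C_W = \frac{\sqrt{n}}{n-1}$,  $D_W = 1$

Table 1: Resampling constants for classical resampling weight vectors.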



2.3 Resampling weight vectors 

In this section, we consider the question of choosing some appropriate
resampling weight vector $W$ when using Theorem 2.1 or Corollary 2.2. We
define the following classical resampling weight vectors:

1. Rademacher: $W_i$ i.i.d. Rademacher variables, i.e. $W_i \in \{-1, 1\}$
with equal probabilities.

2. Efron: $W$ has a multinomial distribution with parameters
$(n; n^{-1}, \ldots, n^{-1})$.

3. Random hold-out (q) (R. h.-o.), $q \in \{1, \ldots, n\}$:
$W_i = \frac{n}{q} \mathbf{1}_{i \in I}$, where $I$ is uniformly distributed
on subsets of $\{1, \ldots, n\}$ of cardinality $q$. These weights may also be
called cross validation weights, or leave-(n - q)-out weights. A classical
choice is $q = n/2$ (when $2|n$). When $q = n - 1$, these weights are called
leave-one-out weights.
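The three weight families listed above can be drawn as follows. This is a minimal sketch with hypothetical function names, using only the standard library; it assumes the random hold-out weights take the value $n/q$ on the retained subset and $0$ elsewhere.

```python
import random


def rademacher_weights(n, rng):
    """i.i.d. signs in {-1, +1} with equal probabilities."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def efron_weights(n, rng):
    """Multinomial(n; 1/n, ..., 1/n): counts of n draws with replacement."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def random_holdout_weights(n, q, rng):
    """W_i = n/q on a uniformly chosen subset of size q, and 0 elsewhere."""
    kept = set(rng.sample(range(n), q))
    return [n / q if i in kept else 0.0 for i in range(n)]


rng = random.Random(2)
w_rad = rademacher_weights(10, rng)
w_efron = efron_weights(10, rng)
w_holdout = random_holdout_weights(10, 4, rng)
w_loo = random_holdout_weights(10, 9, rng)  # leave-one-out: q = n - 1
```

Efron and random hold-out weights both have empirical mean exactly 1, whereas the mean of Rademacher weights is random.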

For these classical weights, exact or approximate values for the quantities
$A_W$, $B_W$, $C_W$ and $D_W$ (defined by equations (2) to (5)) can be easily
derived (see Tab. 1). However, an exact computation of the resampling
estimates
$\mathbb{E}[\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y}]$
using these weights would be time-consuming when $n$ is large. The more
standard way to solve this problem is to compute resampling quantities by
Monte-Carlo simulations, i.e. by drawing a small number of weight vectors
(see [Hal92], appendix II for a discussion). However, we have not yet
analyzed the corresponding thresholds.

Another way to solve this computation time problem is to consider a
regular partition $(B_j)_{1 \le j \le V}$ of $\{1, \ldots, n\}$ (where
$V \in \{2, \ldots, n\}$ and $V | n$), and to define the weights
$W_i = \frac{V}{V-1} \mathbf{1}_{i \notin B_J}$ with $J$ uniformly
distributed on $\{1, \ldots, V\}$. These weights are called the (regular)
V-fold cross validation weights (V-f. c.v.), which are no longer exchangeable
but still "piece-wise exchangeable". Considering the process
$(\widetilde{Y}^j)_{1 \le j \le V}$ where
$\widetilde{Y}^j = \frac{V}{n} \sum_{i \in B_j} Y^i$ is the empirical mean of
$\mathbf{Y}$ on block $B_j$, we can show that Theorem 2.1 can be extended to
(regular) V-fold cross validation weights with the following resampling
constants:

$$A_W = \frac{2}{V}; \quad B_W = \frac{1}{\sqrt{V-1}}; \quad C_W = \sqrt{n}\,(V-1)^{-1}; \quad D_W = 1.$$

When $V$ does not divide $n$ and the blocks are no longer regular, Theorem 2.1
can also be generalized, but the constants have more complex expressions.
Note that in the Gaussian framework of (13), V-fold cross-validation
weights approximate the estimation risk
$\mathbb{E}\|\overline{\mathbf{Y}} - \mu\|_p$ by
$\frac{\sqrt{V-1}}{V^2} \sum_{j=1}^V \|\widetilde{Y}^{(-j)} - \widetilde{Y}^j\|_p$,
where $\widetilde{Y}^{(-j)}$ is the mean of the
$(\widetilde{Y}^\ell)_{\ell \ne j}$; which bears a strong analogy with the
usual cross-validation philosophy. Actually, the "classical" leave-one-out
estimator $\frac{1}{n} \sum_{i=1}^n \|\overline{\mathbf{Y}}^{(-i)} - Y^i\|_p$
approximates a different quantity, the prediction risk
$\mathbb{E}\|\overline{\mathbf{Y}} - Y^{n+1}\|_p$ for a new independent vector
$Y^{n+1}$. However, under (GA) the two types of risk are proportional,
$\sqrt{n+1}\, \mathbb{E}\|\overline{\mathbf{Y}} - \mu\|_p = \mathbb{E}\|\overline{\mathbf{Y}} - Y^{n+1}\|_p$;
taking into account this scaling we conclude that our estimator (with
$V = n$) coincides with the classical leave-one-out (up to the factor
$\sqrt{1 - 1/n^2} \approx 1$).

To guide our choice for a specific resampling scheme, the first comparison
point is that $t_{\alpha,W}(\mathbf{Y})$ should be an accurate upper bound of
the ideal threshold. Under the Gaussian assumption, in view of (6),
$C_W B_W^{-1}$ appears as a relevant accuracy index for $t_{\alpha,W}$.
However, a second comparison point is the price of an exact computation of
$t_{\alpha,W}$ in practice. Since one must consider each possible weight
vector to compute exactly the threshold, we use the cardinality of the support
of $\mathcal{L}(W)$ as a complexity index.

As shown in Tab. 2, there is an accuracy-complexity trade-off for choosing
the weights. Since for all exchangeable weights
$C_W B_W^{-1} \ge \sqrt{n/(n-1)}$, R. h.-o. (n/2) and leave-one-out weights
are optimal for accuracy (Rademacher and Efron being "almost optimal"). On
the other hand, V-fold c.-v. is less accurate, losing a factor
$\sqrt{(n-1)/(V-1)}$. From the computational viewpoint, the leave-one-out is
the only reasonable exchangeable procedure (at least when $n$ and $K$ are
large), and V-f. c.v. looks even more attractive. Considering that
$t_{\alpha,W}$ involves the sum of terms of order $C_W B_W^{-1} n^{-1}$ and
$n^{-1/2}$, the best choice of $V$ should be rather small for most
applications. We do not give here any universal optimal $V$ since it does not
exist, but we suggest using Tab. 2 to choose it.
Resampling              $C_W B_W^{-1}$ (accuracy)                                                              Card(supp $\mathcal{L}(W)$) (complexity)

Efron                   $\le \frac{1}{2}\left(1 - \frac{1}{n}\right)^{-n} \to \frac{e}{2}$ as $n \to \infty$   $\binom{2n-1}{n-1}$
Rademacher              $\le \left(1 - n^{-1/2}\right)^{-1} \to 1$ as $n \to \infty$                           $2^n$
R. h.-o. (n/2)          $= \sqrt{\frac{n}{n-1}}$                                                               $\binom{n}{n/2} \propto n^{-1/2}\, 2^n$
Leave-one-out           $= \sqrt{\frac{n}{n-1}}$                                                               $n$
regular V-fold c.-v.    $= \sqrt{\frac{n}{V-1}}$                                                               $V$

Table 2: Choice of the resampling weight vectors: accuracy-complexity
tradeoff.

3 Confidence region using resampled quantiles 

In the previous section we have shown how to derive non-asymptotic
confidence regions for the mean of a Gaussian (resp. bounded) vector with
unknown correlation structure; for this we used a concentration property of
the quantities $\phi(\overline{\mathbf{Y}} - \mu)$ and
$\mathbb{E}[\phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y}]$
around their mean. The Gaussian (resp. McDiarmid's) concentration property
allowed us to bound deviations from this mean by the deviations of a suitably
scaled normal (resp. subgaussian) variable. Through this approach, the level
of the confidence region is rigorously controlled for any fixed sample size.

However, the obtained confidence regions are somewhat unsatisfying
because they appear to be too conservative in practice. The principal reason
for this is that $\phi(\overline{\mathbf{Y}} - \mu)$ is of course not a
Gaussian variable (even when $\mathbf{Y}$ is). Therefore, in spite of the
power of the Gaussian concentration property, using Gaussian tails as a bound
for the deviations of the above non-Gaussian variable must necessarily result
in some loss of tightness.

On the other hand, in most applications of resampling procedures, it
is common to estimate the quantiles of a variable like
$\phi(\overline{\mathbf{Y}} - \mu)$ by the quantiles of the corresponding
resampled distribution
$\mathcal{L}\left( \phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y} \right)$,
and to use these quantiles to construct a confidence region. Again, while
many asymptotic results are available to justify this method (for instance
[VdVW96]), our goal here is to derive a non-asymptotic region based on a
similar approach for which the confidence level is proved to hold for any
fixed sample size.

For this we apply a principle that is close in spirit to exact tests, i.e.
we take advantage of an invariance property (here symmetry around the
mean) of the initial distribution and use a resampling scheme that respects
this invariance. For this reason the scope of the current section is far less
general: instead of covering generic resampling weights, we only consider the
particular Rademacher resampling scheme. Let us define for a function $\phi$
the resampled empirical quantile:

$$q_\alpha(\phi, \mathbf{Y}) = \inf\left\{ x \in \mathbb{R} \ \text{s.t.}\ \mathbb{P}_W\left[ \phi(\overline{\mathbf{Y}}_{[W]}) > x \right] \le \alpha \right\},$$

where $W$ is an i.i.d. Rademacher weight vector. We now state the main
technical result of this section:

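Proposition 3.1. Fix $\delta, \alpha \in (0,1)$. Let $\mathbf{Y}$ be a data
sample satisfying assumption (SA). Let
$f : (\mathbb{R}^K)^n \to [0,\infty)$ be a nonnegative (measurable)
function on the set of data samples. Let $\phi$ be a nonnegative, subadditive,
positive-homogeneous function. Denote
$\widetilde{\phi}(x) = \max(\phi(x), \phi(-x))$. Finally, for
$\eta \in (0,1)$, denote

$$\overline{B}(n, \eta) = \min\left\{ k \in \{0, \ldots, n\} \ \text{s.t.}\ 2^{-n} \sum_{i=k+1}^{n} \binom{n}{i} < \eta \right\}$$

the upper quantile function of a binomial $(n, \frac{1}{2})$ variable. Then we
have:

$$\mathbb{P}\left[ \phi(\overline{\mathbf{Y}} - \mu) > q_{\alpha(1-\delta)}(\phi, \mathbf{Y} - \overline{\mathbf{Y}}) + f(\mathbf{Y}) \right]
\le \alpha + \mathbb{P}\left[ \widetilde{\phi}(\overline{\mathbf{Y}} - \mu) > \frac{n}{2\overline{B}\left(n, \frac{\alpha\delta}{2}\right) - n}\, f(\mathbf{Y}) \right].$$

Remark 3.2. By Hoeffding's inequality,
$\frac{n}{2\overline{B}\left(n, \frac{\alpha\delta}{2}\right) - n} \ge \left( \frac{n}{2 \ln\left( \frac{2}{\alpha\delta} \right)} \right)^{1/2}$.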
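The binomial upper quantile of Proposition 3.1 and the bound of Remark 3.2 can be checked numerically. The sketch below is an illustration with hypothetical function names; it uses exact integer arithmetic for the binomial tail, and the values of n, alpha and delta are arbitrary examples.

```python
from math import comb, log, sqrt


def binomial_upper_quantile(n, eta):
    """Smallest k in {0, ..., n} such that P(Bin(n, 1/2) > k) < eta."""
    total = 2 ** n
    tail = 0  # equals 2^n * P(B > k); it is 0 for k = n
    k = n
    # Decrease k as long as P(B > k - 1) is still below eta.
    while k > 0 and (tail + comb(n, k)) / total < eta:
        tail += comb(n, k)
        k -= 1
    return k


def correction_factor(n, alpha, delta):
    """The factor n / (2 * B(n, alpha * delta / 2) - n) of Proposition 3.1."""
    b = binomial_upper_quantile(n, alpha * delta / 2.0)
    return n / (2 * b - n)


n, alpha, delta = 1000, 0.045, 0.1
factor = correction_factor(n, alpha, delta)
# Lower bound given by Hoeffding's inequality (Remark 3.2).
hoeffding_bound = sqrt(n / (2.0 * log(2.0 / (alpha * delta))))
```

The function `correction_factor` assumes $2\overline{B}(n, \alpha\delta/2) > n$, which holds for small levels such as the ones used here.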

By iteration of this proposition we obtain the following corollary: 

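Corollary 3.3. Fix $J$ a positive integer, $(\alpha_i)_{i=0,\ldots,J-1}$ a
finite sequence in $(0,1)$ and $\beta, \delta \in (0,1)$. Let $\mathbf{Y}$ be
a data sample satisfying assumption (SA). Let
$\phi : \mathbb{R}^K \to \mathbb{R}$ be a nonnegative, subadditive,
positive-homogeneous function and $f : (\mathbb{R}^K)^n \to [0,\infty)$ be a
nonnegative function on the set of data samples. Then the following holds:

$$\mathbb{P}\left[ \phi(\overline{\mathbf{Y}} - \mu) > q_{(1-\delta)\alpha_0}(\phi, \mathbf{Y} - \overline{\mathbf{Y}})
+ \sum_{i=1}^{J-1} \gamma_i\, q_{(1-\delta)\alpha_i}(\widetilde{\phi}, \mathbf{Y} - \overline{\mathbf{Y}}) + \gamma_J f(\mathbf{Y}) \right]
\le \sum_{i=0}^{J-1} \alpha_i + \mathbb{P}\left[ \widetilde{\phi}(\overline{\mathbf{Y}} - \mu) > f(\mathbf{Y}) \right], \tag{14}$$

where, for $k \ge 1$,
$\gamma_k = n^{-k} \prod_{i=0}^{k-1} \left( 2\overline{B}\left(n, \frac{\alpha_i \delta}{2}\right) - n \right)$.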
The rationale behind this result is that the sum appearing inside the
probability should be interpreted as a series of corrective terms of decreasing
order of magnitude, since we expect the sequence $\gamma_k$ to be sharply
decreasing. Looking at Hoeffding's bound, this will be the case if the levels
are such that $\alpha_i \gg \exp(-n)$.

Then comes the remaining issue of the trailing term on the right-hand
side. While it is tempting to think that it would be possible to obtain a
self-contained result based on the symmetry assumption (SA) alone, we did not
succeed in this direction. To upper-bound the trailing term, we can assume
some additional regularity assumption on the distribution of the data. For
example, if the data are Gaussian or bounded, we can apply the results in
the previous section (or apply some other device like Bonferroni's bound, see
Remark 2.3). The point is that this bound does not have to be particularly
sharp, since we expect (in favorable cases) the trailing probability term on
the right-hand side as well as the contribution of $\gamma_J f(\mathbf{Y})$
to the left-hand side to be almost negligible.

It seems plausible that at least a minor regularity assumption (suppos- 
edly significantly weaker than assuming a Gaussian distribution or bounded 
data) is actually a necessary condition in addition to (SA) to obtain a self- 
contained bound and ensure that nothing pathological happens with the 
extreme quantiles, but this remains as an interesting open issue. 

As before, for computational reasons, it might be relevant to consider a 
block-wise Rademacher resampling scheme. 
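In practice the resampled quantile $q_\alpha(\phi, \mathbf{Y} - \overline{\mathbf{Y}})$ is typically approximated by Monte-Carlo draws of Rademacher signs rather than by enumerating all $2^n$ sign vectors. The sketch below illustrates this under assumptions of our own: the function names are hypothetical, $\phi = \sup|\cdot|$ is used as an example, the sample is a list of n observation vectors, and the Monte-Carlo approximation itself is not covered by the non-asymptotic guarantees above.

```python
import random


def sup_abs(x):
    return max(abs(v) for v in x)


def center(sample):
    """Subtract the empirical mean from every observation (Y - Ybar)."""
    n, dim = len(sample), len(sample[0])
    mean = [sum(y[k] for y in sample) / n for k in range(dim)]
    return [[y[k] - mean[k] for k in range(dim)] for y in sample]


def resampled_values(sample, phi, n_draws, rng):
    """Draws of phi((1/n) * sum_i W_i * Y^i) for i.i.d. Rademacher signs W."""
    n, dim = len(sample), len(sample[0])
    values = []
    for _ in range(n_draws):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        mean = [sum(signs[i] * sample[i][k] for i in range(n)) / n
                for k in range(dim)]
        values.append(phi(mean))
    return values


def empirical_upper_quantile(values, alpha):
    """inf{x : fraction of values strictly above x is at most alpha}."""
    ordered = sorted(values)
    allowed_above = int(alpha * len(ordered))
    return ordered[len(ordered) - allowed_above - 1]


rng = random.Random(0)
n, K = 40, 8
sample = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]
values = resampled_values(center(sample), sup_abs, n_draws=1000, rng=rng)
q_05 = empirical_upper_quantile(values, 0.05)
q_20 = empirical_upper_quantile(values, 0.20)
```

A block-wise variant would draw one sign per block of observations instead of one sign per observation.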

4 Simulations 

For simulations we consider data of the form $Y_t = \mu_t + G_t$, where $t$
belongs to an $m \times m$ discretized 2D torus of $K = m^2$ "pixels",
identified with $\mathbb{T}^2_m = (\mathbb{Z}/m\mathbb{Z})^2$, and $G$ is a
centered Gaussian vector obtained by 2D discrete convolution of an i.i.d.
standard Gaussian field ("white noise") on $\mathbb{T}^2_m$ with a function
$F : \mathbb{T}^2_m \to \mathbb{R}$ such that
$\sum_{t \in \mathbb{T}^2_m} F^2(t) = 1$. This ensures that $G$ is a
stationary Gaussian process on the discrete torus; it is in particular
isotropic with $\mathbb{E}[G_t^2] = 1$ for all $t \in \mathbb{T}^2_m$.

In the simulations below we consider for the function $F$ a "Gaussian"
convolution filter of bandwidth $b$ on the torus:

$$F_b(t) = C_b \exp\left( -d(0,t)^2 / b^2 \right),$$

where $d(t, t')$ is the standard distance on the torus and $C_b$ is a
normalizing constant. Note that for actual simulations it is more convenient
to work in the Fourier domain and to apply the inverse DFT, which can be
computed efficiently.

[Figure 1, right panel. Title: "Gaussian kernel convolution, K=128x128,
n=1000, level 5%". Curves: Quant.+conc., Quant.+bonf., Conc., Bonferroni,
min(conc.,bonf.), Est. true quantile, Single test. Horizontal axis:
convolution kernel width (pixels).]

Figure 1: Left: example of a 128x128 pixel image obtained by convolution
of Gaussian white noise with a (toroidal) Gaussian filter with width b = 18
pixels. Right: average thresholds obtained for the different approaches, see
text.

We then compare the different thresholds obtained by the methods
proposed in this work for varying values of $b$. Remember that the only
information available to the algorithm is the bound on the marginal variance;
the form of the function $F_b$ itself is of course unknown.
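The construction of the simulated field described above can be sketched as follows. This is an illustration only: it uses a direct (quadratic-time) circular convolution on a small torus instead of the Fourier-domain computation mentioned in the text, the function names are hypothetical, and it assumes $b > 0$.

```python
import random
from math import exp, sqrt


def torus_distance_sq(t, m):
    """Squared distance from the origin to t on the m x m discrete torus."""
    dx = min(t[0], m - t[0])
    dy = min(t[1], m - t[1])
    return dx * dx + dy * dy


def gaussian_filter(m, b):
    """F_b(t) = C_b * exp(-d(0, t)^2 / b^2), normalized so that sum F_b^2 = 1."""
    raw = {(i, j): exp(-torus_distance_sq((i, j), m) / b ** 2)
           for i in range(m) for j in range(m)}
    norm = sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}


def correlated_field(m, filt, rng):
    """Circular convolution of i.i.d. standard Gaussian noise with the filter."""
    noise = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(m)]
    return [[sum(f * noise[(x - i) % m][(y - j) % m]
                 for (i, j), f in filt.items())
             for y in range(m)]
            for x in range(m)]


m, b = 8, 2.0
filt = gaussian_filter(m, b)
field = correlated_field(m, filt, random.Random(3))
```

Because the filter has unit squared norm, each pixel of `field` has unit variance, while neighbouring pixels become more correlated as `b` grows.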

On Fig. 1 we compare the thresholds obtained when $\phi = \sup|\cdot|$, which
corresponds to the two-sided multiple testing situation. We use the different
approaches proposed in this work, with the following parameters: the
dimension is $K = 128^2 = 16384$, the number of data points per sample is
$n = 1000$ (much smaller than $K$, so that we really are in a non-asymptotic
framework), the width $b$ takes even values in the range $[0, 40]$, the
overall level is $\alpha = 0.05$. For the concentration threshold (6)
('conc.'), we used Rademacher weights. For the "compound" threshold of
Corollary 2.2 ('min(conc.,bonf.)'), we used $\delta = 0.1$ and the Bonferroni
threshold $t'_{\mathrm{Bonf},0.9\alpha}$ as the deterministic reference
threshold. For the quantile approach (14), we used $J = 1$,
$\alpha_0 = 0.9\alpha$, $\delta = 0.1$, and the function $f$ is given either
by the Bonferroni threshold ('quant.+bonf.') or the concentration threshold
('quant.+conc.'), both at level $0.1\alpha$. Each point represents an average
over 50 experiments. Finally, we included in the figure the Bonferroni
threshold $t'_{\mathrm{Bonf},\alpha}$, the threshold for a single test for
comparison, and an estimation of the true quantile (actually, an empirical
quantile over 1000 samples).

The quantiles or expectation with Rademacher weights were estimated
by Monte-Carlo with 1000 draws. On the figure we did not include standard
deviations: they are quite low, of the order of $10^{-3}$, although it is
worth noting that the quantile threshold has a standard deviation roughly
twice as large as the concentration threshold (we did not investigate at this
point what part of this variation is due to the MC approximation).

The overall conclusion of this preliminary experiment is that the different
thresholds proposed in this work are relevant in the sense that they are
smaller than the Bonferroni threshold provided the vector has strong enough
correlations. As expected, the quantile approach appears to lead to tighter
thresholds. (However, this might not always be the case for smaller sample
sizes.) One advantage of the concentration approach is that the 'compound'
threshold (7) can "fall back" on the Bonferroni threshold when needed, at
the price of a minimal threshold increase.



5 Proofs 



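Proof of Proposition 2.4. Denoting by $\Sigma$ the common covariance matrix of
the $Y^i$, we have
$\mathcal{L}\left( \overline{\mathbf{Y}}_{[W-\overline{W}]} \mid W \right) = \left( n^{-1} \sum_{i=1}^n (W_i - \overline{W})^2 \right)^{1/2} \mathcal{N}(0, n^{-1}\Sigma)$,
and the result follows because
$\mathcal{L}(\overline{\mathbf{Y}} - \mu) = \mathcal{N}(0, n^{-1}\Sigma)$ and
$\phi$ is positive-homogeneous.

□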

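Proof of Proposition 2.6. (i). By independence between $W$ and $\mathbf{Y}$,
using the positive homogeneity, then convexity of $\phi$, for every
realization of $\mathbf{Y}$ we have:

$$A_W\, \phi(\overline{\mathbf{Y}} - \mu)
= \phi\left( \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n \left| W_i - \overline{W} \right| (Y^i - \mu) \,\Big|\, \mathbf{Y} \right] \right)
\le \mathbb{E}\left[ \phi\left( \frac{1}{n} \sum_{i=1}^n \left| W_i - \overline{W} \right| (Y^i - \mu) \right) \Big|\, \mathbf{Y} \right].$$

We integrate with respect to $\mathbf{Y}$, and use the symmetry of the $Y^i$
with respect to $\mu$ and again the independence between $W$ and $\mathbf{Y}$
to show finally that

$$A_W\, \mathbb{E}\left[ \phi(\overline{\mathbf{Y}} - \mu) \right]
\le \mathbb{E}\left[ \phi\left( \frac{1}{n} \sum_{i=1}^n \left| W_i - \overline{W} \right| (Y^i - \mu) \right) \right]
= \mathbb{E}\left[ \phi\left( \frac{1}{n} \sum_{i=1}^n (W_i - \overline{W}) (Y^i - \mu) \right) \right]
= \mathbb{E}\left[ \phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \right].$$

We obtain (ii) via the triangle inequality and the same symmetrization trick.

□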

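Proof of Proposition 2.8. We denote by $A$ a square root of the common
covariance matrix of the $Y^i$ and by $(a_k)_{1 \le k \le K}$ the rows of $A$.
If $G$ is a $K \times n$ matrix with standard centered i.i.d. Gaussian
entries, then $AG$ has the same distribution as $\mathbf{Y} - \mu$. We let for
all $\zeta \in (\mathbb{R}^K)^n$,
$T_1(\zeta) := \phi\left( \frac{1}{n} \sum_{i=1}^n A\zeta_i \right)$ and
$T_2(\zeta) := \mathbb{E}\left[ \phi\left( \frac{1}{n} \sum_{i=1}^n (W_i - \overline{W}) A\zeta_i \right) \right]$.
From the Gaussian concentration theorem of Cirel'son, Ibragimov and Sudakov
(see for example [Mas05], Theorem 3.8), we just need to prove that $T_1$
(resp. $T_2$) is a Lipschitz function with constant $\|\sigma\|_p / \sqrt{n}$
(resp. $\|\sigma\|_p C_W / n$), for the Euclidean norm $\|\cdot\|_{2,Kn}$ on
$(\mathbb{R}^K)^n$. Let $\zeta, \zeta' \in (\mathbb{R}^K)^n$. Using
Cauchy-Schwarz's inequality coordinate-wise and $\|a_k\|_2 = \sigma_k$, we
deduce

$$\left| T_1(\zeta) - T_1(\zeta') \right|
\le \left\| \frac{1}{n} \sum_{i=1}^n A(\zeta_i - \zeta'_i) \right\|_p
\le \|\sigma\|_p \left\| \frac{1}{n} \sum_{i=1}^n (\zeta_i - \zeta'_i) \right\|_2.$$

Therefore, we get
$\left| T_1(\zeta) - T_1(\zeta') \right| \le \frac{\|\sigma\|_p}{\sqrt{n}} \|\zeta - \zeta'\|_{2,Kn}$
by convexity of $x \in \mathbb{R}^K \mapsto \|x\|_2^2$, and we obtain (i). For
$T_2$, we use the same method as for $T_1$:

$$\left| T_2(\zeta) - T_2(\zeta') \right|
\le \|\sigma\|_p\, \mathbb{E}\left\| \frac{1}{n} \sum_{i=1}^n (W_i - \overline{W})(\zeta_i - \zeta'_i) \right\|_2
\le \frac{\|\sigma\|_p}{n} \sqrt{ \mathbb{E}\left\| \sum_{i=1}^n (W_i - \overline{W})(\zeta_i - \zeta'_i) \right\|_2^2 }. \tag{15}$$

We now develop
$\left\| \sum_{i=1}^n (W_i - \overline{W})(\zeta_i - \zeta'_i) \right\|_2^2$
in the Euclidean space $\mathbb{R}^K$ (note that from
$\mathbb{E}\left( \sum_{i=1}^n (W_i - \overline{W}) \right)^2 = 0$, we have
$\mathbb{E}(W_1 - \overline{W})(W_2 - \overline{W}) = -C_W^2/n$):

$$\mathbb{E}\left\| \sum_{i=1}^n (W_i - \overline{W})(\zeta_i - \zeta'_i) \right\|_2^2
= C_W^2 (1 - 1/n) \sum_{i=1}^n \left\| \zeta_i - \zeta'_i \right\|_2^2
- \frac{C_W^2}{n} \sum_{i \ne j} \left\langle \zeta_i - \zeta'_i,\ \zeta_j - \zeta'_j \right\rangle
= C_W^2 \sum_{i=1}^n \left\| \zeta_i - \zeta'_i \right\|_2^2
- \frac{C_W^2}{n} \left\| \sum_{i=1}^n (\zeta_i - \zeta'_i) \right\|_2^2.$$

Consequently,

$$\mathbb{E}\left\| \sum_{i=1}^n (W_i - \overline{W})(\zeta_i - \zeta'_i) \right\|_2^2
\le C_W^2 \sum_{i=1}^n \left\| \zeta_i - \zeta'_i \right\|_2^2
= C_W^2 \left\| \zeta - \zeta' \right\|_{2,Kn}^2. \tag{16}$$

Combining expression (15) and (16), we find that $T_2$ is
$\|\sigma\|_p C_W / n$-Lipschitz.

□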
Proof of Theorem 2.1. The case (BA)(p, M) and (SA) is obtained by
combining Proposition 2.6 and McDiarmid's inequality (see for instance
[Fro04]). The (GA) case is a straightforward consequence of Proposition 2.4
and the proof of Proposition 2.8. □

Proof of Corollary 2.2. From Proposition 2.8 (i), with probability at least
$1 - \alpha(1-\delta)$, $\phi(\overline{\mathbf{Y}} - \mu)$ is upper bounded
by the minimum between $t_{\alpha(1-\delta)}$ and
$\mathbb{E}\phi(\overline{\mathbf{Y}} - \mu) + \frac{\|\sigma\|_p\, \overline{\Phi}^{-1}(\alpha(1-\delta)/2)}{\sqrt{n}}$
(because these thresholds are deterministic). In addition, Proposition 2.4 and
Proposition 2.8 (ii) give that with probability at least $1 - \alpha\delta$,
$\mathbb{E}\phi(\overline{\mathbf{Y}} - \mu) \le \frac{\mathbb{E}\left[ \phi(\overline{\mathbf{Y}}_{[W-\overline{W}]}) \mid \mathbf{Y} \right]}{B_W} + \frac{\|\sigma\|_p C_W}{B_W n}\, \overline{\Phi}^{-1}(\alpha\delta/2)$.
The result follows by combining the two last expressions. □



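Proof of Proposition 3.1. Remember the following inequality coming from
the definition of the quantile $q_\alpha$: for any fixed $\mathbf{Y}$,

$$\mathbb{P}_W\left[ \phi(\overline{\mathbf{Y}}_{[W]}) > q_\alpha(\phi, \mathbf{Y}) \right] \le \alpha
\le \mathbb{P}_W\left[ \phi(\overline{\mathbf{Y}}_{[W]}) \ge q_\alpha(\phi, \mathbf{Y}) \right], \tag{17}$$

which will be useful in this proof. We have

$$\mathbb{P}_{\mathbf{Y}}\left[ \phi(\overline{\mathbf{Y}} - \mu) > q_\alpha(\phi, \mathbf{Y} - \mu) \right]
= \mathbb{E}_W\left[ \mathbb{P}_{\mathbf{Y}}\left[ \phi\left( \overline{(\mathbf{Y} - \mu)}_{[W]} \right) > q_\alpha\left( \phi, (\mathbf{Y} - \mu)_{[W]} \right) \right] \right]$$
$$= \mathbb{E}_{\mathbf{Y}}\left[ \mathbb{P}_W\left[ \phi\left( \overline{(\mathbf{Y} - \mu)}_{[W]} \right) > q_\alpha(\phi, \mathbf{Y} - \mu) \right] \right]
\le \alpha. \tag{18}$$

The first equality is due to the fact that the distribution of $\mathbf{Y}$
satisfies assumption (SA), hence the distribution of $(\mathbf{Y} - \mu)$ is
invariant by reweighting by (arbitrary) signs $W \in \{-1, 1\}^n$. In the
second equality we used Fubini's theorem and the fact that for any arbitrary
signs $W$ as above
$q_\alpha\left( \phi, (\mathbf{Y} - \mu)_{[W]} \right) = q_\alpha(\phi, \mathbf{Y} - \mu)$;
finally the last inequality comes from (17). Let us define the event

$$\Omega = \left\{ \mathbf{Y} \ \text{s.t.}\ q_\alpha(\phi, \mathbf{Y} - \mu) \le q_{\alpha(1-\delta)}(\phi, \mathbf{Y} - \overline{\mathbf{Y}}) + f(\mathbf{Y}) \right\};$$

then we have using (18):

$$\mathbb{P}\left[ \phi(\overline{\mathbf{Y}} - \mu) > q_{\alpha(1-\delta)}(\phi, \mathbf{Y} - \overline{\mathbf{Y}}) + f(\mathbf{Y}) \right]
\le \mathbb{P}\left[ \phi(\overline{\mathbf{Y}} - \mu) > q_\alpha(\phi, \mathbf{Y} - \mu) \right] + \mathbb{P}\left[ \mathbf{Y} \in \Omega^c \right]
\le \alpha + \mathbb{P}\left[ \mathbf{Y} \in \Omega^c \right]. \tag{19}$$

We now concentrate on the event $\Omega^c$. Using the subadditivity of
$\phi$, and the fact that
$\overline{(\mathbf{Y} - \mu)}_{[W]} = \overline{(\mathbf{Y} - \overline{\mathbf{Y}})}_{[W]} + \overline{W}(\overline{\mathbf{Y}} - \mu)$,
we have for any fixed $\mathbf{Y} \in \Omega^c$:

$$\alpha \le \mathbb{P}_W\left[ \phi\left( \overline{(\mathbf{Y} - \mu)}_{[W]} \right) \ge q_\alpha(\phi, \mathbf{Y} - \mu) \right]$$
$$\le \mathbb{P}_W\left[ \phi\left( \overline{(\mathbf{Y} - \mu)}_{[W]} \right) > q_{\alpha(1-\delta)}(\phi, \mathbf{Y} - \overline{\mathbf{Y}}) + f(\mathbf{Y}) \right]$$
$$\le \mathbb{P}_W\left[ \phi\left( \overline{(\mathbf{Y} - \overline{\mathbf{Y}})}_{[W]} \right) > q_{\alpha(1-\delta)}(\phi, \mathbf{Y} - \overline{\mathbf{Y}}) \right]
+ \mathbb{P}_W\left[ \phi\left( \overline{W}(\overline{\mathbf{Y}} - \mu) \right) > f(\mathbf{Y}) \right]$$
$$\le \alpha(1-\delta) + \mathbb{P}_W\left[ \phi\left( \overline{W}(\overline{\mathbf{Y}} - \mu) \right) > f(\mathbf{Y}) \right].$$

For the first and last inequalities we have used (17), and for the second
inequality the definition of $\Omega^c$. From this we deduce that

$$\Omega^c \subset \left\{ \mathbf{Y} \ \text{s.t.}\ \mathbb{P}_W\left[ \phi\left( \overline{W}(\overline{\mathbf{Y}} - \mu) \right) > f(\mathbf{Y}) \right] \ge \alpha\delta \right\}.$$

Now using the homogeneity of $\phi$, and the fact that both $\phi$ and $f$ are
nonnegative:

$$\mathbb{P}_W\left[ \phi\left( \overline{W}(\overline{\mathbf{Y}} - \mu) \right) > f(\mathbf{Y}) \right]
= \mathbb{P}_W\left[ |\overline{W}| > \frac{f(\mathbf{Y})}{\phi\left( \mathrm{sign}(\overline{W})(\overline{\mathbf{Y}} - \mu) \right)} \right]
\le \mathbb{P}_W\left[ |\overline{W}| > \frac{f(\mathbf{Y})}{\widetilde{\phi}(\overline{\mathbf{Y}} - \mu)} \right]$$
$$= 2\, \mathbb{P}\left[ \frac{1}{n}\left( 2B_{n,\frac{1}{2}} - n \right) > \frac{f(\mathbf{Y})}{\widetilde{\phi}(\overline{\mathbf{Y}} - \mu)} \,\Big|\, \mathbf{Y} \right],$$

where $B_{n,\frac{1}{2}}$ denotes a binomial $(n, \frac{1}{2})$ variable
(independent of $\mathbf{Y}$). From the two last displays we conclude

$$\Omega^c \subset \left\{ \mathbf{Y} \ \text{s.t.}\ \widetilde{\phi}(\overline{\mathbf{Y}} - \mu) > \frac{n}{2\overline{B}\left(n, \frac{\alpha\delta}{2}\right) - n}\, f(\mathbf{Y}) \right\},$$

which, put back in (19), leads to the desired conclusion.

□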